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X supports what is called the graphical user interface (GUI): the user sees a 
graphical output and the user can input data via a keyboard and a pointing device 
such as a mouse. In order to run the numerous existing utilities and applications 
based on the character user interface (CUI) in X, an X client program called xterm 
was released with X version 10, the first public version of X. The program xterm 
emulates an ASCII character terminal in X. It behaves like a hard-wired character- 
based terminal, except that resizing the xterm window is allowed and the column 
and/or row numbers of the emulated terminal will change accordingly. However, 
like most hard-wired terminals, xterm cannot cope with multi-byte international 
character sets. 

In this paper, we describe cxterm, a Chinese terminal emulation program for X, 
which enables the display and input of Chinese characters. For brevity, we shall use 
the Chinese ‘pronunciation spelling’ hanzi to stand for Chinese character(s). Note 
that hanzi is used as both a singular and a plural noun. 

In the following sections, we describe the problem of computing involving multi-byte 
international characters. In particular, we concentrate on the treatment of 
hanzi. We describe the difficulties encountered and the important design decisions 
in developing cxterm. A significant feature of cxterm is that different hanzi input 
methods are supported; the addition of a new input method does not require any 
change in the cxterm program. Finally, we discuss related work in terminal emulation 
for multi-byte character computing, and the extension of cxterm to input Chinese 
phrases. 

MULTI-BYTE CHARACTER REPRESENTATIONS 

Computers traditionally process characters represented in single bytes only, such 
as EBCDIC or ASCII. Nowadays eight-bit European language character sets are 
commonly supported by vendors. For the Chinese, Japanese or Korean languages, 
the character sets contain thousands of logo-syllabic characters that mainly 
originate from Chinese characters. A code of more than one byte is used to 
represent each character. The corresponding national standard bodies have defined 
standard character sets that contain the commonly used characters of the respective 
languages. 

For the Chinese language, the most widely used official standard character set in 
Mainland China is GB2312-80 (GB for short). 5 It contains 6763 simplified hanzi,* 
each represented by a two-byte code of the form 0xxxxxxx 0xxxxxxx (where x can 
be either 0 or 1). In Taiwan and Hong Kong, most commercial programs use the 
so-called Big-5 character set. 6 It contains over 13,000 traditional hanzi, each 
represented by a two-byte code of the form 0xxxxxxx xxxxxxxx. 

A common way to represent more than one character set in a stream of bytes is 
to use an escape character sequence to signify the start of another character set, and 
sometimes, also to signify the end of a character set. This is the scheme commonly 
used in Japanese language processing software, 7 where the Japanese characters are 
represented in ASCII, kana and kanji† character sets. 

* Currently, simplified hanzi is used daily in Mainland China while traditional hanzi is used in Taiwan and Hong 
Kong. However, even in Mainland China, knowledge of traditional hanzi is still required to study pre-1957 writings. 
(The first standard on written simplified hanzi was introduced in 1957.) 

† Kanji is the collective term for Chinese characters in the Japanese language. The number of kanji in Japanese 
is much smaller than the number of hanzi in Chinese. 
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seven-bit character set (ASCII) > high-bit) of the first byte of the 
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table. The input processing module then displays the hanzi in the entry as a choice- 
list of candidate hanzi for the user to choose among. 

Commonly used input methods are based on the pronunciation (or pinyin) or the 
radical structure (the shape) of hanzi, or both. Although there are many different 
input methods, none of them has gained universal acceptance, and many more are 
being developed. 8,9 A Chinese input processing software module should therefore 
be adaptable to support different input methods, since different people may favour 
different input methods and since new input methods may be proposed. 

Chinese is an ideographic language. For hanzi output, we need a way to draw the 
character image of any hanzi code on the screen. The widespread use of workstations 
with bitmap display hardware and graphics software solves this problem. Bitmaps 
of hanzi character images can be put into a font file, and today's workstations already 
support multiple character fonts. As the X window system becomes more and more 
popular, it is desirable to have Chinese processing software in X." 1 

Our first attempt* to add Chinese processing capability to X and to develop cxterm 
was based on X10 (X version 10). However, X10 only supported eight-bit character 
fonts. A brute force approach is to let the first byte of a two-byte code identify a 
font and the second byte identify the character image within that font. This would 
require a large number of fonts, and consequently the performance of X would 
seriously degrade when there is a lot of switching between fonts. Instead, we decided 
to enhance the X protocol 4 by adding special X library routines and corresponding 
program code in the X server to open font files of two-byte hanzi code and to 
retrieve the character image directly. The use of such files did not follow the X10 
font standard as there was no X10 font standard for 16-bit characters. 

Soon after our effort on the display of hanzi in X10 was completed, X version 11 
was announced. X11 supports 16-bit character fonts. This simplifies the task of 
cxterm to display a hanzi. We can simply use the two-byte code of a hanzi as an 
index to retrieve its image in any 16-bit hanzi font. 
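
As a rough illustration of the indexing just described, the two bytes of a hanzi code can be combined into a single 16-bit glyph index. This is only a sketch: the function names `glyph_index` and `split_index` and the sample byte values are assumptions made for illustration, and cxterm itself is a C program whose actual drawing code is not shown in this paper.

```python
def glyph_index(byte1, byte2):
    """Combine the two bytes of a hanzi code into one 16-bit font index."""
    if not (0 <= byte1 <= 0xFF and 0 <= byte2 <= 0xFF):
        raise ValueError("each byte must be in the range 0..255")
    return (byte1 << 8) | byte2


def split_index(index):
    """Recover the (first byte, second byte) pair from a 16-bit index."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index must fit in 16 bits")
    return (index >> 8) & 0xFF, index & 0xFF


# Hypothetical two-byte code used only as sample data.
print(hex(glyph_index(0x30, 0x21)))
```

With a single 16-bit index, one font file can hold the whole character set, which avoids the font switching that made the brute force X10 approach slow.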


APPROACHES TO HANZI INPUT PROCESSING IN X 

X11 makes hanzi output as easy to produce as ASCII output. One remaining concern 
for hanzi processing in X is handling hanzi input in application programs. 

Hanzi input processing in X can be done in the X server, at the X client program 
level, or by a special input server program. An input server is a special X client 
program running as a separate process. It implements some input methods, and 
other X client applications can communicate with it by sending input requests and 
receiving converted hanzi characters. 11 

Adding input processing to the X server is undesirable because running Chinese 
processing software should not require a special extended version of the X server. 
On the other hand, input processing can be implemented at the X client program 
level in a simple way. The hanzi input conversion can be made part of the X client 
process, and the hanzi input processing area can be made part of the application 
window. 

The input server approach has its advantages. By separating the input conversion 
processing from the application program, it requires only one input server process 
for multiple applications. However, this approach causes more inter-process 
communication overhead because the application program has to send hanzi input requests 
to the input server, and the input server has to send the hanzi code back to the 
application. Moreover, it is harder to design an input server with a user-friendly 
interface. For example, it would often be necessary for a user to switch his attention 
between the application window and the input server window because hanzi input 
requires user interaction in the input server window when choosing the desired hanzi 
from a choice-list of candidates. It is unacceptable if the user has to move the mouse 
cursor back and forth between two windows. A user-friendly input server should be 
carefully designed to avoid this. Also, having many application programs sharing 
one input server raises the problem of concurrent access to the input server window. 
Since each application has its own independent input context in the server, the input 
server has to guarantee that hanzi input for several applications is routed properly. 
In summary, to avoid the complexity of the input server approach, we choose to do 
[...] cxterm, a Chinese terminal emulator, and let it take care of hanzi input/output. The 
conversion of a keystroke sequence to the corresponding hanzi code [...] 
and is hidden from the application program running in cxterm. This approach provides 
a user-friendly character-based human-computer interface for many application 
programs. Existing character terminal based programs operating only on ASCII 
characters can run in cxterm without modifications. These programs can be modified to 
[...] Chinese characters. For traditional screen-oriented programs 
[...] in cxterm is simplified by the high-[...] 

We have also considered building a hanzi input library, comparable to the standard 
ASCII input library, so that any X client application program can be linked with 
the library. The library contains routines to convert an input keystroke sequence to 
hanzi code. The application calls the library routines during run-time whenever 
[...] cxterm instead of building a hanzi input/output library. 

* When the authors were with the Institute of Software, Academia Sinica, Beijing, China. 
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character set could be GB, Big-5, or any other with a two-byte code of the form 
0xxxxxxx xxxxxxxx, where x is 0 or 1. 

Secondly, cxterm should provide efficient and flexible hanzi input handling. It 
should support multiple hanzi input methods, allow the user to switch among them, 
and allow him to define and add any input method without modifying the cxterm 
program. 

Moreover, cxterm should support all the functionalities of the ASCII terminal 
emulator xterm, namely: repainting following exposure, resizing and scrolling of the 
cxterm window, and cutting and pasting text among many cxterm windows. It should 
also allow the use of different hanzi fonts. 

Finally, cxterm should be self-contained. That is, it should not depend on any 
extension of the X window system, such as the internationalization extension in 
X11 Release 5. 


USER INTERFACE 

Cxterm now runs on many brands of workstations running X11 Release 4 and Release 
5. Figure 1 shows the appearance of the cxterm window when cxterm is invoked. It 
is similar to xterm, except that the bottom part of the window has been marked off 
by a horizontal line and serves as the hanzi input processing area (input area for 
short). The input area is not a sub-window. The user can input hanzi when the 
mouse pointer is anywhere inside the cxterm window. The cxterm program knows 
that only the portion of the window above the horizontal line is the emulated 
terminal area. 



Figure 1. cxterm window 

Choosing hanzi input methods 

When a cxterm window is created, it [...] mode, as indicated by the input area 
prompt message as seen [...]. In this mode, cxterm interprets keystrokes exactly as 
does xterm: [...] the corresponding ASCII characters are entered into the input stream 
of the emulated terminal. The application program currently [...] characters. The 
program could be a shell, the [...] 

[...] mechanism to implement the [...] cxterm window when the input mode is changed [...] 

[...] currently in the pinyin input mode. 

Figure 2. Data flow of the user keystroke 

[...] simply call it the input stream. 

Figure 3. cxterm window in pinyin input mode 


Choosing the desired hanzi from the choice-list 

Figure 4 shows the input area immediately after the user presses ‘a’ as the first 
keystroke in pinyin input mode. There are many hanzi with a pinyin sequence 
starting with ‘a’. These hanzi form a choice-list ready to be displayed for the user 
to choose the desired hanzi. The maximum number of choices displayed in the input 
area, say n, is limited by the cxterm window width. We have tried the default value 
of n = 7 because studies had shown that people can handle about seven chunks of 
data at a time. However, we [...] widely, and most users [...] specifying the desired 
maximum number in the [...] 

Figure 4. Input area of cxterm when ‘a’ is pressed 

Table I. 

Switching control key    Input method 
F1                       ASCII 
F2                       internal representation code 
F3                       pinyin tone 
F4                       pinyin 
F5                       double-byte ASCII symbol 
F6                       double-byte punctuation symbol 
F7                       first- and last-stroke encoding 
F8                       telegraph code 
F9                       Cantonese pinyin 
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TEXTUAL AND COMPILED INPUT TABLES 

A user can specify a hanzi input method in an input table definition file, say with 
file name ‘inputMethodName.TIT’, where TIT stands for Textual Input Table. At 
run-time, cxterm loads the input table of the current input method into memory. 
The user's keystroke sequences are then matched against this input table. Figure 6 
gives the essential details of the ‘pinyin.TIT’ file for the pinyin input method. The 
following paragraphs explain the format of the TIT file. 

The keyword ‘ENCODE:’ defines the character set being used, e.g. ‘GB’ or ‘Big5’. 

The keyword ‘PROMPT:’ introduces the input area prompt message. The message 
is displayed in the first line of the input area to remind the user what the current 
input method is. 

The keyword ‘SELECTKEY:’ defines the selection keys that are used to choose one 
hanzi from a list of many alternatives. Note that the first token specified is ‘1\040’. 
This means that in order to select the first choice, the user can press either the key 
‘1’ or the space-bar (the ASCII space character has octal value 040). The other 
specified selection keys are separated by spaces. If the ‘SELECTKEY:’ line is not 
specified, the result is the same as if ‘SELECTKEY: 1\040 2 3 4 5 6’ were entered. 
The user can specify more or fewer selection keys; e.g. the line ‘SELECTKEY: 
1\040 2 3 4 5 6 7 8 9 0’ defines 10 selection keys. 

The keywords ‘MOVERIGHT:’ and ‘MOVELEFT:’ define the choice-list traversal keys 
for the user to traverse the choice-list. 
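
The interplay between selection keys and traversal keys can be sketched as simple paging over the choice-list. The sketch below assumes that the traversal keys move the visible window by one full page of selection keys; the text says only that these keys traverse the choice-list, so the paging step, the function name `visible_choices` and the placeholder candidates are all assumptions.

```python
def visible_choices(choices, select_keys, page):
    """Return (page, [(key, candidate), ...]) for one page of a choice-list.

    The page number is clamped so that moving left of the first page or
    right of the last page simply stays on that page.
    """
    n = len(select_keys)
    if n == 0:
        raise ValueError("at least one selection key is required")
    last_page = max(0, (len(choices) - 1) // n)
    page = min(max(page, 0), last_page)
    window = choices[page * n:(page + 1) * n]
    return page, list(zip(select_keys, window))


# Placeholder candidates stand in for hanzi; "1" doubles as the space-bar
# in the TIT default, which is ignored here for brevity.
candidates = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
keys = ["1", "2", "3"]
page, shown = visible_choices(candidates, keys, 0)
page, shown = visible_choices(candidates, keys, page + 1)  # a MOVERIGHT step
print(page, shown)
```

Pairing each visible candidate with a selection key keeps the number of keys independent of the length of the choice-list, which is why a short ‘SELECTKEY:’ line still lets the user reach every candidate.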

The set of ASCII characters that form valid keystroke sequences for a certain input 
method is called the acceptable alphabet of the input method. In the TIT file, the 
keyword 'VALIDINPUTKEY:' is used to define such an alphabet. 

The following part of the TIT file is the dictionary section of the input method. 
Introduced by the keyword ‘BEGINDICTIONARY’, this section contains all valid 
keystroke sequences; and for each sequence, there is a corresponding choice-list of 
hanzi. 

Owing to the large character set, a choice-list normally consists of more than one 
hanzi. A good input method always tries to minimize the lengths of the choice-lists, 
i.e. to reduce the number of duplicated hanzi matching a certain keystroke sequence. 

ENCODE: GB 
PROMPT: [...] 
SELECTKEY: [...] 
MOVERIGHT: [...] 
MOVELEFT: [...] 
VALIDINPUTKEY: [...] 
[...] 

Figure 6. Part of ‘pinyin.TIT’ file 

For run-time efficiency, the user must use the utility program tit2cit to convert the 
TIT file ‘inputMethodName.TIT’ to a well-organized internal format, called 
Compiled Input Table (CIT), in the ‘inputMethodName.CIT’ file. Cxterm reads the CIT 
file, not the TIT file, to load the actual input table into memory. In short, TIT is 
designed for the user to specify and maintain an input method while CIT is designed 
for cxterm to use. For completeness, another utility program, cit2tit, is developed to 
convert a CIT file to the corresponding TIT file. 
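
To make the TIT keywords above concrete, here is a minimal parsing sketch. It is not the tit2cit implementation. It assumes one ‘KEYWORD: value’ pair per line, dictionary lines made of a keystroke sequence followed by whitespace and the candidate characters, and three-digit octal escapes such as ‘\040’. The real TIT syntax may have features not described in the text, and the names `parse_tit` and `decode_keys` are hypothetical.

```python
import re

_OCTAL = re.compile(r"\\([0-7]{3})")
DEFAULT_SELECTKEY = r"1\040 2 3 4 5 6"   # default stated in the text


def decode_keys(token):
    """Expand octal escapes (backslash plus three digits) in a key token."""
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), token)


def parse_tit(text):
    """Return (header, select_keys, dictionary) for TIT-like text."""
    header, dictionary = {}, {}
    in_dictionary = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGINDICTIONARY":
            in_dictionary = True
        elif in_dictionary:
            # First token is the keystroke sequence, the rest is its choice-list.
            parts = line.split(None, 1)
            candidates = parts[1] if len(parts) > 1 else ""
            dictionary.setdefault(parts[0], []).extend(
                ch for ch in candidates if not ch.isspace())
        else:
            keyword, sep, value = line.partition(":")
            if sep:
                header[keyword.strip()] = value.strip()
    select_keys = [decode_keys(tok)
                   for tok in header.get("SELECTKEY", DEFAULT_SELECTKEY).split()]
    return header, select_keys, dictionary


# Latin letters stand in for hanzi in this sample.
sample = "ENCODE: GB\nSELECTKEY: 1\\040 2 3\nBEGINDICTIONARY\na XY\nai Z\n"
print(parse_tit(sample))
```

Each decoded selection token is a string of interchangeable keys, so the token ‘1\040’ becomes the two characters ‘1’ and space, matching the rule that either key selects the first choice.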

IMPLEMENTATION 

The program cxterm is derived from xterm. As discussed in the second section, cxterm 
identifies a hanzi by the high bit of its first byte being on, and no escape character 
sequence is used for switching between character sets. 
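
The high-bit convention can be sketched as a single pass over the byte stream: a byte with the high bit on starts a two-byte hanzi, and any other byte is a one-byte ASCII character. The function name `split_characters` and the sample bytes are assumptions for illustration; this is not code from cxterm.

```python
def split_characters(data):
    """Split a byte stream into one-byte ASCII and two-byte hanzi units."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # High bit on: this byte and the next one form a single hanzi.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character at end of data")
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars


# One ASCII byte, one two-byte code (sample values), one ASCII byte.
print(split_characters(b"a\xb0\xa1b"))
```

Because only the first byte is tested, the second byte may take any value, and no escape sequence is needed to switch between ASCII and hanzi.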

xterm requires a normal ASCII font; cxterm requires an extra normal hanzi font. 
xterm supports an optional bold ASCII font; cxterm also supports an optional bold 
hanzi font. These two optional fonts can be used independently of each other. A 
characteristic of hanzi is that the width of the character images in a font is constant. 
Thus the normal and bold hanzi images should have the same width. For nice-looking 
display, the character image width of the hanzi fonts should be twice that 
of the ASCII fonts. This also implies that fixed width ASCII fonts should be used. 

For input processing, whenever the user presses a key, X traps it and notifies 
cxterm as a keystroke event. In hanzi input mode, cxterm captures the keystroke 
character in a buffer, and matches the buffer against the current input table. When 
[...] and the user types a selection key, it means that a hanzi is successfully 
[...] are obtained from the input table 
and entered into the input stream. The two-byte codes in the input table have the 
[...] discussed previously, then [...] 
input method, the keystroke buffer is cleared. Then cxterm captures any new 
keystrokes in the buffer and matches the buffer against the new CIT. 


Data structures 

The data structure of the hanzi input table in cxterm is carefully designed to be [...] 

Figure 7. The trie structure of ‘pinyin.TIT’ 


branch in the trie, with the first character in the sequence being the start node of 
the branch (which is one of the first level nodes below the root), and the last 
character in the sequence being the end node of the branch (which is either a leaf 
or an internal node in the trie). The end node of the branch contains a pointer to 
the choice-list of hanzi corresponding to the valid keystroke sequence. 


Matching valid keystroke sequences against the input table 

The input processing of cxterm is independent of the input methods. New input 
methods can be independently incorporated, without modifying the program code 
of cxterm. 

This is achieved by employing a generic trie traversal routine. No matter from 
which CIT file the in-core trie is read and built, the trie has the same structure. 
Therefore cxterm simply maintains a pointer to the current node of the in-core trie. 
At any moment during hanzi input mode, the keystroke character buffer is in one 
of two states: start state, when the buffer is empty, or matched state, when the buffer 
matches either a valid keystroke sequence or a prefix substring of some valid 
keystroke sequences. Correspondingly, when the buffer is in the start state, the trie 
traversal pointer points to the root, and when the buffer is in the matched state, it 
points to a node other than the root. 

A design decision we made was to allow the user to choose a hanzi after any 
number of valid keystrokes are entered. In the matched state, the pointer points to 
a non-root node. Conceptually, cxterm has to merge all the choice-lists of this node 
and its descendants to form a merged choice-list. Actually, by using the data structures 
shown in Figure 8, no such merging is needed. We store all the hanzi choice-lists in 
an array, and each trie node has two pointers pointing to the start and end of the 
merged choice-list of the node. In this way, hanzi input based on partial input of a 
valid keystroke sequence is possible. 
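
The start and end pointers described above can be sketched as follows. The sketch assumes the flat array is laid out in depth-first order, so that the candidates of a node and of all its descendants are contiguous; the paper does not state the exact layout used in the CIT file, and the names `Node`, `build_trie` and `candidates` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}   # next keystroke -> Node
        self.own = []        # candidates for the sequence ending exactly here
        self.start = 0       # flat[start:end] is the merged choice-list
        self.end = 0


def build_trie(table):
    """Build the trie and the flat candidate array from {sequence: [candidates]}."""
    root = Node()
    for keys, items in table.items():
        node = root
        for ch in keys:
            node = node.children.setdefault(ch, Node())
        node.own.extend(items)

    flat = []

    def layout(node):
        # Depth-first layout keeps every subtree in one contiguous slice.
        node.start = len(flat)
        flat.extend(node.own)
        for ch in sorted(node.children):
            layout(node.children[ch])
        node.end = len(flat)

    layout(root)
    return root, flat


def candidates(root, flat, keystrokes):
    """Return the merged choice-list for a keystroke buffer, or None if invalid."""
    node = root              # start state: traversal pointer at the root
    for ch in keystrokes:
        node = node.children.get(ch)
        if node is None:
            return None      # buffer is not a prefix of any valid sequence
    return flat[node.start:node.end]


# Placeholder strings stand in for hanzi.
root, flat = build_trie({"a": ["h1"], "ai": ["h2", "h3"], "an": ["h4"], "ba": ["h5"]})
print(candidates(root, flat, "a"))
```

No list is copied or merged at lookup time: each keystroke moves the traversal pointer one level down, and the two stored indices already delimit the merged choice-list.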

As an example, for the pinyin input method, an input keystroke ‘a’ in the keystroke 
buffer causes merging of the choice-lists of valid keystroke sequences ‘a’, ‘ai’, ‘an’, 
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Miscellaneous processing 

In copying characters from a window, cxterm has to ensure that [...] 
of a selected text always falls on the first byte of a hanzi. If [...] 
points to the second byte of a hanzi, [...] 

[...] also be a problem if a border falls in the middle 
of a hanzi when [...] 

[...] for the case of hanzi in all xterm actions and handle them properly. 
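
The boundary requirement for selected text can be sketched as snapping a byte offset back to the start of the character that contains it. The sketch scans from the beginning of the line because a second byte cannot be recognised in isolation under the high-bit convention. The function name `snap_to_character_start` and the scanning strategy are assumptions; the paper's own description of this step did not survive intact.

```python
def snap_to_character_start(line, offset):
    """Move offset back by one if it points at the second byte of a hanzi."""
    if not 0 <= offset <= len(line):
        raise ValueError("offset outside the line")
    i = 0
    while i < offset:
        step = 2 if line[i] & 0x80 else 1
        if i + step > offset:
            return i         # offset falls inside the two-byte character at i
        i += step
    return i


line = b"a\xb0\xa1b"         # ASCII, one two-byte code (sample values), ASCII
print(snap_to_character_start(line, 2))
```

The same check applies to any operation that cuts a line at a byte position, such as clearing part of a row.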

SOME USEFUL PROGRAMS RUNNING IN CXTERM 

Cxterm provides a satisfactory solution to the hanzi input/output problem of CUI 
[...] byte of [...] moving the pointer [...] the buffer [...] the 
editor vi [...] shows that with the support of cxterm, it is easy to 
convert existing CUI-based programs to handle hanzi. 

COMPARISON WITH RELATED WORK 

[...] hanzi. In [...] package, a [...] 
is used for [...] program [...] 
terminal emulator that was originally built to serve as a Japanese kanji 
terminal emulator. Kanji code input in kterm is performed through a separate input 
[...] program, kinput. Although the recent version of kterm supports the [...] 
hanzi, hanzi input has not yet been incorporated into kinput. 

Both mlxterm and kterm adopt the input server approach. In addition to the 
communication overhead, the quality of the implementation of the input servers has 
a large influence on the performance of the terminal emulators. For example, to input 
hanzi in an mlxterm window, the user must press some control key to set the mlxterm 
window to a certain input mode (called MLR mode), and move the mouse to the 
input server window. In the input server window, the user must select an input 
method first, and then type in the keystrokes for a hanzi according to the rules of 
the input method. It is cumbersome to do hanzi input if more than one mlxterm 
window is opened on the same screen, since input request by one mlxterm window 
automatically cancels the input mode of another mlxterm, and changing input method 
in one application implicitly forces all other applications to use the same input 
method. Also, it is not user-friendly because the user has to move the mouse back 
and forth, and turn some mode on and off in order to enter a hanzi. 

The kanji conversion in kinput is better designed. When the conversion starts, 
kinput pops up a conversion window below the bottom side of the kterm window. To 
avoid mouse pointer movement, kinput also creates a transparent input-only window 
lying on top of the kterm window, and effectively grabs any input keystrokes in the 
kterm window. However, there is the overhead of having a window popped up each 
time kanji input is required. In contrast, input processing in different 
cxterm windows is independent of each other and is efficient. 

Wnn is an input server that performs the conversion of a keystroke sequence to 
kanji, according to some pre-defined rules. 17 There is also a Chinese version of Wnn 
called cWnn, which converts a keystroke sequence to hanzi. Wnn/cWnn defines a 
set of conversion protocols and interface routines that an application program must 
follow in order to perform any input. To use Wnn/cWnn, one has to modify an 
application program and link it with a special library for Wnn/cWnn. As pointed out 
before, a sophisticated input server can be huge and complex. For the Wnn/cWnn 
input server, the size of the source code alone is about 1800 kilobytes. In comparison, 
the built-in hanzi input processing module of cxterm is much simpler and more 
elegant; the size of the source code of the input module is only about 36 kilobytes. 

There are internationalization (known as I18N) extensions of X libraries in X11R5 
that support input methods. 111 However, different implementations of input methods 
may result in different I18N extensions of the X libraries. In contrast, cxterm is only 
a client program and it requires no special I18N extension to the X libraries. 

Emacs is a widely-used extensible editor and it can be run in an X window. 
There is a Japanese extension of emacs, called nemacs. The input processing in 
nemacs is through the Wnn input server. There are requests to add hanzi processing 
capabilities to emacs. However, emacs is a large piece of software of more than 10 
megabytes of source code, and modifying it is not an easy task. Up to the time this 
paper was written, we had not heard of any successful incorporation of hanzi 
input/output processing directly into emacs. In contrast, a small patch of code of about five 
kilobytes for emacs has been developed. The code is mainly to make emacs eight-bit 
clean, and will be officially released with emacs version 19. The user can run this 
patched emacs in a cxterm window and use it to edit hanzi. 

CONCLUDING REMARKS 

The cxterm approach described in this paper overcomes the drawbacks of many input 
server approaches to Chinese input processing. The input processing in cxterm is 
efficient and user-friendly. The separation of the specification of input methods from 
input processing provides flexibility for the users to develop and use their favourite 
input methods. This is achieved by dynamic loading of compiled input tables. 

M.-C. PONG AND Y. ZHANG 

Although our work is targeted at Chinese language processing, the approach 
employed should be useful in other bilingual processing systems. For example, the 
input methods for inputting Japanese or Korean characters can be specified in TIT 
files, as done for Chinese characters. Also, cxterm supports the use of different font 
files. As long as two-byte internal codes are used to represent Japanese or Korean 
characters and to index into the corresponding font file to retrieve the character 
images, we do not need to modify cxterm for displaying Japanese or Korean 
characters. 

We released cxterm in 1991 as free software. A future enhancement of 
cxterm should involve extending hanzi input processing to support input of 
multi-hanzi phrases. This would speed up the input process because inputting one hanzi 
at a time is too slow. Since many Chinese phrases are frequently used, one can 
define some valid keystroke sequences to represent these Chinese phrases. Then the 
input processing module can capture the user keystroke sequence in a buffer and 
search for the corresponding choice-list of hanzi phrases. The user then selects the 
desired phrase from the choice-list, and the input processing module puts the internal 
hanzi codes of the selected phrase into the input stream. 

One simple way to incorporate hanzi phrase input without substantial change of 
cxterm is to allow the user to define the mapping of a keystroke sequence to a 
choice-list of phrases in the TIT file. Figure 9(a) shows a portion of such an input 
table. As for single hanzi, the same TIT file specification is used, except that there 
is one more section for specifying keystroke sequences and the corresponding phrases 
separated by commas. This section is introduced by the keyword ‘BEGINPHRASES’. 
The CIT file corresponding to this TIT file contains the same trie data structure, 
except that the choice-lists contain a mix of single hanzi and hanzi phrases. The 
same trie traversal and hanzi selection mechanism can be used. 

The user interface is also similar, as illustrated in Figure 9(b). The user has entered 
the keystroke sequence ‘zhong’. The input area shows a choice-list of single hanzi 
and hanzi phrases. If the user presses the selection key ‘4’, cxterm will put the 
internal codes of the chosen two-hanzi phrase into the input stream. This is more 
effective than inputting hanzi one by one. We are in the process of extending cxterm to 
support phrase input in this direction. 
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BEGINDICTIONARY 
zhong [...] 

BEGINPHRASES 
zhong1 [...] 
zhong2 [...] 
zhong3 [...] 

(a) Part of an input table file with phrases 

(b) Input area of cxterm with phrase input 

Figure 9. Hanzi phrases in the input table and in cxterm input area 
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SUMMARY 

C does not have exception handling facilities. Errors are handled by examining the value returned by 
each function and signals (conditions reported to the program) are handled by using library functions. 
These approaches lead to ad hoc error-handling techniques and can make programs hard to understand. 
Exceptional C, a superset of C, provides exception handling facilities. Exceptional C integrates the two 
techniques used by C programmers (i.e., status values and signals) to handle errors into one unified 
exception handling mechanism. In this paper, I review exception handling models, specify the criteria 
used for designing the exception handling facilities in Exceptional C, and then describe these facilities. 
I also illustrate the use of the exception handling facilities with examples. 
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INTRODUCTION 

As programs grow in size, programming complexity is significantly increased by the 
need to handle special cases and unusual or error conditions which crop up with 
startling frequency. 1 In fact, up to 50-70 per cent of the code in a large system can 
be code for handling errors. Error-handling code is a weak link in the development 
of robust programs: errors are not well understood and error-handling code is not 
well tested. 2 Consequently, as programs grow larger, a standard mechanism for 
handling exceptions becomes important. 3 

An exception is an error or an event that occurs unexpectedly or infrequently, 
such as division by zero, out-of-memory, or the premature interruption of program 
execution. Exception handling covers both the response to the exception and 
recovery from the error or the infrequent event. An exception mechanism is a language 
control structure that specifies that the standard continuation of a program be 
replaced by an exceptional continuation upon the detection of an exception. 
Exception handling enhances program reliability and fault tolerance. 

Several languages (such as PL/I, Ada, Mesa, and CLU) provide explicit exception 
handling facilities but many languages (such as Pascal, C and FORTRAN) do not. 
In languages without exception handling facilities, other methods must be used to 
indicate exceptions and to handle exceptions. The most common method is to have 
functions return ‘status values’ or ‘funny values’. If the value returned by a function 
indicates that an exception has occurred during the function call, then appropriate 
action is taken. The status-value technique 5,6 
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